Robust classification of salient links in complex networks 
Complex networks in natural, social, and technological systems generically exhibit an abundance of
rich information. Extracting meaningful structural features from data is one of the most challenging 
tasks in network theory. Many methods and concepts have been proposed to address this problem 
such as centrality statistics, motifs, community clusters, and backbones, but such schemes typically 
rely on external and arbitrary parameters. It is unknown whether generic networks permit the 
classification of elements without external intervention. Here we show that link salience is a robust 
approach to classifying network elements based on a consensus estimate of all nodes. A wide range 
of empirical networks exhibit a natural, network-implicit classification of links into qualitatively 
distinct groups, and the salient skeletons have generic statistical properties. Salience also predicts 
essential features of contagion phenomena on networks, and points towards a better understanding 
of universal features in empirical networks that are masked by their complexity. 



I. INTRODUCTION 

Many systems in physics, biology, social science, economics, and technology are best modeled as a collection of discrete elements that interact through an intricate, complex set of connections. Complex network theory, a marriage of ideas and methods from statistical physics and graph theory, has become one of the most successful frameworks for studying these systems and has led to major advances in our understanding of transportation [1–11], ecological systems [12, 13], social and communication networks [14], and metabolic and gene regulatory pathways in living cells [15–17].

One of the challenges in complex network research is the identification of essential structural features that are typically masked by the network's topological complexity [1, 5, 18–20]. Reducing a large-scale network to its core components, filtering redundant information, and extracting essential components are critical not only for efficient network data management; more importantly, these methods are often required to better understand evolutionary and dynamical processes on networks and to identify universal principles of network design or growth.
In this context, the notion of centrality measures according to which nodes or links can be ranked is fundamental and epitomized by the node degree $k$, the number of directly connected neighbors of a node. Many systems, ranging from human sexual contacts [21] to computer networks [22], are characterized by a power-law degree distribution $p(k) \sim k^{-(1+\beta)}$ with an exponent $0 < \beta < 2$. These networks are scale-free [23], meaning the majority of nodes are weakly connected and dominated by a few strongly connected nodes, known as hubs. Although a variety of networks can be understood in terms of their topological connectivity (the set of nodes and links), a number of systems are better captured by weighted networks in which links carry weights $w$ that quantify their strengths [5, 24]. An important class of networks exhibits both a scale-free degree distribution and broadly distributed weights, which in some cases follow a power law $p(w) \sim w^{-(1+\alpha)}$ with $1 < \alpha < 3$ [25–27]. In addition to hubs, these networks thus possess highways. Several representative networks of this class are depicted in Figure 1. Understanding the essential underlying structures in these networks is particularly challenging because of the mix of link and node heterogeneity.

Although classifications of network elements according to degree, weight, or other centrality measures have been employed in many contexts [30–32], this approach comes with several drawbacks. The qualitative concepts of hubs and highways suggest a clear-cut, network-intrinsic categorization of elements. However, these centrality measures are typically distributed continuously and generally do not provide a straightforward separation of elements into qualitatively distinct groups. At what precise degree does a node become a hub? At what strength does a link become a highway? Despite significant advances, current state-of-the-art methods rely on system-specific thresholds, comparisons to null models, or imposed topological constraints [9, 11, 33–35].
Whether generic heterogeneous networks provide a way to intrinsically segregate elements into qualitatively distinct groups remains an open question. In addition to this fundamental question, centrality thresholding is particularly problematic in heterogeneous networks, since key properties of reduced networks can depend sensitively on the chosen threshold.

Table I: Statistical features of the full empirical networks and their high-salience skeletons. Statistics for the full networks include the number of nodes $N$, the link density $\rho = 2L/(N^2 - N)$ (where $L$ is the number of links), the mean node degree $\langle k \rangle$, the coefficients of variation of node degree $\mathrm{CV}(k)$ and link weight $\mathrm{CV}(w)$, and the assortativity coefficient $r$ [28]. For the high-salience skeletons, the first column lists the percentage of links from the full network that are also in the HSS, followed by an estimate of the scaling exponent $\beta_{\mathrm{HSS}}$ [29] and the assortativity coefficient $r_{\mathrm{HSS}}$. Further information on network statistics is provided in Supplementary Table S1.
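The full-network statistics named in the Table I caption can be computed directly from a weighted edge list. The sketch below is illustrative only and assumes an undirected network given as `(i, j, w_ij)` tuples, each link listed once and without self-loops; the function name `table_stats` is hypothetical. The assortativity coefficient is evaluated as the Pearson correlation of the degrees at the two ends of each link, counting every link in both directions, which is one standard way to compute Newman's $r$.

```python
import math
from statistics import mean, pstdev


def table_stats(edges):
    """Table I style statistics for an undirected weighted edge list.

    edges: list of (i, j, w_ij) tuples, no self-loops, each link listed once.
    """
    degree = {}
    for i, j, _ in edges:
        degree[i] = degree.get(i, 0) + 1
        degree[j] = degree.get(j, 0) + 1
    n, n_links = len(degree), len(edges)
    ks = list(degree.values())
    ws = [w for _, _, w in edges]

    # Degrees at both ends of every link, each link counted in both directions.
    x = [degree[i] for i, j, _ in edges] + [degree[j] for i, j, _ in edges]
    y = [degree[j] for i, j, _ in edges] + [degree[i] for i, j, _ in edges]
    mx = mean(x)  # by symmetry, mean(y) == mean(x)
    cov = mean((a - mx) * (b - mx) for a, b in zip(x, y))
    var = pstdev(x) ** 2
    return {
        "N": n,
        "rho": 2 * n_links / (n * n - n),          # link density 2L / (N^2 - N)
        "mean_k": mean(ks),                         # <k>
        "CV_k": pstdev(ks) / mean(ks),              # coefficient of variation of k
        "CV_w": pstdev(ws) / mean(ws),              # coefficient of variation of w
        "r": cov / var if var > 0 else math.nan,    # degree assortativity
    }
```

For example, a four-spoke star `[(0, 1, 1.0), (0, 2, 2.0), (0, 3, 3.0), (0, 4, 4.0)]` gives $\rho = 0.4$, $\langle k \rangle = 1.6$ and $r = -1$, the extreme disassortative case.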



Here we address these problems by introducing the 
concept of link salience. The approach is based on an en- 
semble of node-specific perspectives of the network, and 
quantifies the extent to which a consensus among nodes 
exists regarding the importance of a link. We show that 
salience is fundamentally different from link betweenness 
centrality and that it successfully classifies links into dis- 
tinct groups without external parameters or thresholds. 
Based on this classification we introduce the high-salience 
skeleton (HSS) of a network and compute this structure 
for a variety of networks from transportation, biology, so- 
ciology, and economics. We show that despite major dif- 
ferences between representative networks, the skeletons 
of all networks exhibit similar statistical and topological 
properties and significantly differ from alternative back- 
bone structures such as minimal spanning trees. Analyz- 
ing traditional random network models we demonstrate 
that neither broad weight nor degree distributions alone 
are sufficient to produce the patterns observed in real net- 
works. Furthermore, we provide evidence that the emer- 
gence of distinct link classes is the result of the interplay 
of broadly-distributed node degrees and link weights. We 
demonstrate how a static and deterministic analysis of a 
network based on link salience can successfully predict 
the behavior of dynamical processes. We conclude that 
the large class of networks that exhibit broad weight and 
degree distributions may evolve according to fundamen- 
tally similar rules that give rise to similar core structures. 



II. RESULTS 
A. Link salience 

Weighted networks like those depicted in Figure 1 can be represented by a symmetric, weighted $N \times N$ matrix $W$, where $N$ is the number of nodes. Elements $w_{ij} \ge 0$ quantify the coupling strength between nodes $i$ and $j$. Depending on the context, $w_{ij}$ might reflect the passenger flux between locations in transportation networks, the synaptic strength between neurons in a neural network, the value of assets exchanged between firms in a trade network, or the contact rate between individuals in a social network.

Our analysis is based on the concept of effective proximity $d_{ij}$, defined by the reciprocal coupling strength $d_{ij} = 1/w_{ij}$. Effective proximity captures the intuitive notion that strongly (weakly) coupled nodes are close to (distant from) each other [36]. It also provides one way to define the length of a path $\mathcal{P}$ that connects two terminal nodes $(n_1, n_K)$ and consists of $K-1$ legs via a sequence of intermediate nodes $n_k$ and connections $w_{n_k n_{k+1}} > 0$. The shortest path minimizes the total effective distance $\ell = \sum_{k=1}^{K-1} d_{n_k n_{k+1}}$ and can be interpreted as the most efficient route between its terminal nodes [37, 38]; this definition of shortest path is used throughout this paper. In networks with homogeneous weights, shortest paths are typically degenerate, and many different shortest paths coexist for a given pair of terminal nodes. In heterogeneous networks with real-valued weights, shortest paths are typically unique. For a fixed reference node $r$, the collection of shortest paths to all other nodes defines the shortest-path tree (SPT) $T(r)$, which summarizes the most effective routes from the reference node $r$ to the rest of the network. $T(r)$ is conveniently represented by a symmetric $N \times N$ matrix with elements $t_{ij}(r) = 1$ if the link $(i,j)$ is part of at least one of the shortest paths and $t_{ij}(r) = 0$ if it is not.
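As a small worked example (a hypothetical three-node network chosen for illustration, not one of the empirical data sets), consider a triangle $a, b, c$ with two strong links and one weak link. Effective proximity makes the weak direct link longer than the two-leg detour, so it drops out of $T(a)$:

```latex
\begin{aligned}
w_{ab} = w_{bc} = 1,\quad w_{ac} = 0.1
  &\;\Rightarrow\; d_{ab} = d_{bc} = 1,\quad d_{ac} = 1/0.1 = 10,\\
\ell(a \to b \to c) = d_{ab} + d_{bc} = 2
  &\;<\; \ell(a \to c) = d_{ac} = 10,\\
T(a):\quad t_{ab}(a) = t_{bc}(a) = 1,
  &\quad t_{ac}(a) = 0 .
\end{aligned}
```

The same comparison from $b$ and from $c$ gives $T(b) = T(c) = T(a)$, so the weak link $(a,c)$ belongs to no shortest-path tree.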

Figure 1: Generic statistical properties of heterogeneous complex networks. (a) Geographic representation of the worldwide air traffic network (top): black dots represent airports, links represent passenger flux between them, and link weights $w_{ij}$ are color encoded from dark (weak) to white (strong). Networks on the lower left and right represent the Florida bay food web and the world trade network, respectively. Nodes in the food web are species and links represent the exchange of biomass; in the trade network, nodes are countries and links quantify exchange in assets measured in United States dollars (USD). (b) Relative frequencies $f(w) = \langle w \rangle\, p(w/\langle w \rangle)$ and $p(b)$ of link weights $w$ and link betweenness $b$ of representative transportation, biological, ecological, social, and economic networks. Link weights are normalized by the mean weight $\langle w \rangle$. Details on each network are provided in Methods. In all networks, link weights and betweenness are distributed across many orders of magnitude, and both statistics exhibit heavy tails. The substantial variability in these quantities is also reflected in their coefficients of variation (see Table I).

The central idea of our approach is based on the notion of the average shortest-path tree, as illustrated in Figure 2a. We define the salience $S$ of a network as

$$S = \langle T \rangle = \frac{1}{N} \sum_{k} T(k) \qquad (1)$$

so that $S$ is a linear superposition of all SPTs. $S$ can be
calculated efficiently using a variant of a standard algorithm (see Supplementary Methods). According to this definition, the element $0 \le s_{ij} \le 1$ of the matrix $S$ quantifies the fraction of SPTs in which the link $(i,j)$ participates. Since $T(r)$ reflects the set of most efficient paths to the rest of the network from the perspective of the reference node, $s_{ij}$ is a consensus variable defined by the ensemble of root nodes. If $s_{ij} = 1$, link $(i,j)$ is essential for all reference nodes; if $s_{ij} \approx 0$, the link plays no role; and if, say, $s_{ij} = 1/2$, link $(i,j)$ is important for only half the root nodes. Note that although $S$ is defined as an average across the set of shortest-path trees, it is itself not necessarily a tree and is typically different from known structures such as minimal spanning trees (see Supplementary Figure S1, Supplementary Table S3 and Supplementary Methods).
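Equation (1) translates almost line by line into code. The sketch below is a plain-Python illustration, not the efficient algorithm variant referred to in the Supplementary Methods: it runs one Dijkstra search per reference node $r$ on the effective distances $d_{ij} = 1/w_{ij}$, marks every link that lies on at least one shortest path (so degenerate paths are all kept, matching the definition of $t_{ij}(r)$), and averages the resulting trees. The names `build_adjacency`, `shortest_path_tree` and `salience`, and the tie tolerance `tol`, are hypothetical choices for this example.

```python
import heapq
from collections import defaultdict


def build_adjacency(edges):
    """Symmetric adjacency {i: {j: w_ij}} from (i, j, w_ij) tuples with w_ij > 0."""
    adj = defaultdict(dict)
    for i, j, w in edges:
        adj[i][j] = w
        adj[j][i] = w
    return adj


def shortest_path_tree(adj, root, tol=1e-12):
    """Links lying on at least one shortest path from root, with d_ij = 1 / w_ij."""
    dist = {root: 0.0}
    preds = defaultdict(set)               # all tied predecessors of each node
    heap, settled = [(0.0, root)], set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in settled:
            continue
        settled.add(u)
        for v, w in adj[u].items():
            if v in settled:
                continue
            nd = d + 1.0 / w               # effective distance d_uv = 1 / w_uv
            if v not in dist or nd < dist[v] - tol:
                dist[v] = nd
                preds[v] = {u}
                heapq.heappush(heap, (nd, v))
            elif abs(nd - dist[v]) <= tol:
                preds[v].add(u)            # degenerate shortest paths are all kept
    return {frozenset((u, v)) for v, ps in preds.items() for u in ps}


def salience(adj):
    """s_ij = fraction of the N shortest-path trees T(r) that contain link (i, j)."""
    counts = {frozenset((i, j)): 0 for i in adj for j in adj[i]}
    for r in list(adj):
        for link in shortest_path_tree(adj, r):
            counts[link] += 1
    return {link: c / len(adj) for link, c in counts.items()}


tri = build_adjacency([("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 0.1)])
print(salience(tri))
```

For the hypothetical triangle with $w_{ab} = w_{bc} = 1$ and $w_{ac} = 0.1$, this gives $s_{ab} = s_{bc} = 1$ and $s_{ac} = 0$. When shortest paths are unique, each $T(r)$ has exactly $N-1$ links, so the salience values sum to $N-1$, which is a convenient sanity check.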



B. Robust classification of links 

The most important and surprising feature of link salience is depicted in Figure 2c. For the representative set of networks, we find that the distribution $p(s)$ of link salience exhibits a characteristic bimodal shape on the unit interval. The networks' links naturally accumulate at the range boundaries, with a vanishing fraction at intermediate values. Salience thus successfully classifies network links into two groups, salient ($s \approx 1$) or non-salient ($s \approx 0$), and the large majority of nodes agree on the importance of a given link. Since essentially no links fall into the intermediate regime, the resulting classification is insensitive to an imposed threshold, and is an intrinsic and emergent network property characteristic of a variety of strongly heterogeneous networks. This is fundamentally different from common link centrality measures such as weight or betweenness, which possess broad distributions (see Figure 1b) and require external and often arbitrary threshold parameters for meaningful classifications [34, 35].

Figure 2: Computation of link salience and properties of the high-salience skeleton. (a) For each reference node $r$ in the weighted network on the left, the shortest-path tree $T(r)$ is computed. The superposition of all trees according to Equation (1) assigns a value $s_{ij}$ to each link in the original network. Salience values are shown on the right with link color: red is high salience and grey is low. (b) The collection of high-salience links (red) for the networks shown in Figure 1. The full networks are shown in grey. (c) The relative frequency $p(s)$ of non-zero salience values $s$. The distribution $p(s)$ is bimodal in all networks under consideration. This key feature of bimodality of $p(s)$ provides a plausible, parameter-insensitive classification of links, salient ($s \approx 1$) vs. non-salient ($s \approx 0$), and implies that nodes in these networks typically agree whether a link is essential or not. The high-salience skeleton (HSS) is defined as the collection of links that accumulate near $s \approx 1$. Upper and lower insets depict, respectively, the degree distribution $p(k)$ of the HSSs and the mean next-neighbor degree $\langle k_{nn} | k \rangle$ as a function of node degree $k$. The HSS degree distribution is typically scale-free (see Supplementary Figure S2) and the skeletons are typically strongly disassortative. Note that although the skeletons may be, and often are, divided into multiple components, the largest connected component of the skeleton typically dominates. This connectedness is not imposed, but is an emergent property of salience (see Supplementary Table S2).



The salience as defined by Eq. (1) permits an intuitive definition of a network's skeleton as a structure which incorporates the collection of links that accumulate at $s \approx 1$. Figure 2b depicts the skeleton for the networks of Figure 1. For all networks considered, only a small fraction of links are part of the high-salience skeleton (6.76% for the air traffic network, 6.5% for the food web, and 2.39% for the world trade network), and the topological properties of these skeletons are remarkably generic. Note that, technically, a separation of links into groups according to salience requires the definition of a threshold (e.g. we chose the center of the salience range for convenience). The important feature is that the resulting groups are robust against changes in this value, since almost no links fall into intermediate ranges. Consequently, the point of separation is almost arbitrary, yielding almost identical skeletons for thresholds anywhere within a window covering 80% of the entire range. One of the common features of these skeletons is their strong disassortativity, irrespective of the assortativity properties of the corresponding original network (see Table I). Furthermore, all skeletons exhibit a scale-free degree distribution

$$p_{\mathrm{HSS}}(k) \sim k^{-(1+\beta_{\mathrm{HSS}})} \qquad (2)$$

with exponents $1.1 < \beta_{\mathrm{HSS}} < 2.5$ (see Table I and Supplementary Figure S2). Since only links with $s \approx 1$ are present in the HSS, the degree of a node in the skeleton can be interpreted as the total salience of the node. The collapse onto a common scale-free topology is particularly striking since the original networks range from quasi-planar topologies with small local connectivity (the commuter network) to completely connected networks (worldwide trade). Note that the lowest exponent (weakest tail) is observed for the commuter network, since in a quasi-planar network the maximum number of salient connections is limited by the comparatively small degree of the original network. The scale-free structure of the HSS consequently suggests that networks that possess very different statistical and topological properties and that have evolved in a variety of contexts seem to self-organize into structures that possess a robust, disassortative backbone, despite their typical link redundancy.
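The robustness argument above can be checked mechanically: threshold the salience values, extract the skeleton, and compare skeletons obtained at different thresholds. The sketch below assumes salience values are already available as a dictionary keyed by undirected links (represented as `frozenset` pairs). The default threshold of 0.5 follows the text's choice of the center of the salience range; the Jaccard comparison, the toy values, and all function names are hypothetical, illustrative choices rather than the paper's procedure.

```python
def high_salience_skeleton(s, threshold=0.5):
    """Links with salience above the threshold (0.5 = centre of the unit interval)."""
    return {link for link, value in s.items() if value > threshold}


def skeleton_degree(skeleton):
    """Degree of each node inside the skeleton, i.e. its number of links with s ~ 1."""
    deg = {}
    for link in skeleton:
        for node in link:
            deg[node] = deg.get(node, 0) + 1
    return deg


def threshold_robustness(s, thresholds, reference=0.5):
    """Jaccard similarity between each thresholded skeleton and the reference one."""
    ref = high_salience_skeleton(s, reference)
    result = {}
    for t in thresholds:
        hss = high_salience_skeleton(s, t)
        union = ref | hss
        result[t] = len(ref & hss) / len(union) if union else 1.0
    return result


# Hypothetical, strongly bimodal salience values for a 4-node toy network.
toy = {
    frozenset((0, 1)): 0.99, frozenset((1, 2)): 0.97, frozenset((1, 3)): 1.0,
    frozenset((2, 3)): 0.02, frozenset((0, 3)): 0.05,
}
print(threshold_robustness(toy, [0.1, 0.3, 0.5, 0.7, 0.9]))
print(skeleton_degree(high_salience_skeleton(toy)))
```

Because the toy values avoid the intermediate range, every threshold between 0.1 and 0.9 returns the same three-link star centred on node 1 (Jaccard similarity 1.0), mirroring the threshold insensitivity described above.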

Although these properties of link salience are encouraging and suggest novel opportunities for filtering links in complex weighted networks, for understanding hidden core sub-structures, and for defining a network's skeleton, a number of questions need to be addressed and clarified in order for the approach to be viable. First, a possible criticism concerns the definition of salience from shortest-path trees, which suggests that $s_{ij}$ might be trivially obtained from the link betweenness $b_{ij}$, for example by means of a non-linear transform. Secondly, a bimodal $p(s)$ may be a trivial consequence of broad weight distributions, if for instance large weights are typically those with $s \approx 1$. Finally, the observed bimodal shape of $p(s)$ could be a property of any non-trivial network topology, such as simple random weighted networks. In the following we address each of these concerns.



C. Salience and betweenness 

The betweenness $b_{ij}$ of a link $(i,j)$ is the fraction of all $\sim N^2$ shortest paths that pass through the link, whereas the salience $s_{ij}$ is the fraction of the $N$ shortest-path trees $T(r)$ the link is part of. Despite the apparent similarity between these two definitions, the two quantities capture very different qualities of links, as illustrated in Figure 3. Betweenness is a centrality measure in the traditional sense [40], and is affected by the topological position of a link. Networks often exhibit a core-periphery structure [41], and the betweenness measure assigns greater weight to links that are closer to the barycenter of the network [39]. Salience, on the other hand, is insensitive to a link's position, acting as a uniform filter. This is illustrated schematically in the random planar network of Figure 3a. High-betweenness links tend to be located in the center of the planar disk, whereas high-salience links are distributed uniformly. A given shortest path is more likely to cross the center of the disk, whereas the links of a shortest-path tree are uniformly distributed, as they have to span the full network by definition. A detailed mathematical comparison of betweenness and salience is provided in the Methods. Figure 3c depicts the typical relation of betweenness and salience in a correlogram for the worldwide air traffic network. The data cloud is broadly distributed within the range of possible values given by the inequalities (see Supplementary Methods)

$$\frac{s}{N} < b < \ldots \qquad (3)$$



Within these bounds no functional relationship between $b$ and $s$ exists. Given a link's betweenness $b$, one generally cannot predict its salience, and vice versa. In particular, high-salience links ($s \approx 1$) possess betweenness values ranging over many scales. The spread of data points within the theoretical bounds is typical for all the networks considered (see Supplementary Figure S3). Links tend to collect at the right-hand edge, corresponding to the upper peak in salience, and in particular at the lower right corner of the wedge-shaped region, corresponding to the heretofore-unexplained peak in betweenness exhibited by several of the networks (cf. Figure 1b and the dashed line in Figure 3). These edges have maximal salience (all nodes agree on their importance) but the smallest betweenness possible given this restriction (they are not well represented in the set of shortest paths). Such edges are the spokes in the hub-and-spoke structure: they connect a single node to the rest of the network but are used by no others, and they are an essential piece of the high-salience skeleton, since severing them removes some node's best link to the main body of the network. The presence of such links in the high-salience skeleton explains why the weights of $s \approx 1$ edges span such a wide range, since a link may have relatively low weight and yet be some node's most important connection.
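The contrast illustrated by the linear chain in Figure 3b can be made explicit with a short count. Assume a chain of $N$ nodes $1, \dots, N$ with uniform weights and, as one possible convention (the exact normalization used in the Methods may differ), normalize betweenness by the $N(N-1)/2$ unordered node pairs. The link $(m, m+1)$ separates $\{1, \dots, m\}$ from $\{m+1, \dots, N\}$, so:

```latex
\begin{aligned}
b_{m,m+1} &= \frac{m\,(N-m)}{N(N-1)/2} = \frac{2m\,(N-m)}{N(N-1)},
  \qquad m = 1, \dots, N-1,\\
b_{1,2} &= b_{N-1,N} = \frac{2}{N}, \qquad
  b_{\max} = b_{m \approx N/2,\,m+1} \approx \frac{N}{2(N-1)} \approx \frac{1}{2},\\
T(r) &= \text{the whole chain for every } r
  \;\Rightarrow\; s_{m,m+1} = 1 \quad \text{for all } m .
\end{aligned}
```

For $N = 5$ this gives $b = 0.4, 0.6, 0.6, 0.4$ along the chain, while every link has $s = 1$; the two end links are exactly the spoke-like links with maximal salience and minimal betweenness discussed above.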

Figure 3d tests the hypothesis that strong link weights may yield strong values for salience. We observe that link betweenness is positively correlated with link weight and roughly follows a scaling relation $w \sim b^{\gamma}$ with $\gamma \approx 0.2$, in agreement with previous work on node centrality [42]. This is not surprising, since high-weight links are by definition shorter and tend to attract shortest paths. In contrast, link weights exhibit no systematic dependence on salience, and in particular large weights do not generally imply large salience. In fact, for fixed link salience the distribution of weights is broad, with approximately the same median. Consequently, salience can be considered an independent centrality dimension that measures different features than correlated centrality measures such as weight and betweenness.

Figure 3: Salience and betweenness capture different aspects of centrality. (a) A schematic planar network in which the color of links quantifies betweenness $b$ (left) and salience $s$ (right). High-betweenness links tend to be located near the barycenter of the network [39], whereas high-salience links are distributed evenly throughout the network. (b) A simple linear chain shows the reason for this effect. A link in the center serves as a shortest-path bridge between the largest number of node pairs, and so has the highest betweenness. But since all shortest-path trees are identical, all links have identical salience. (c) A scatter plot (red dots) of link salience $s$ versus link betweenness $b$ for the air traffic network (point density is quantified in grey). The vertical dotted line marks $s = 1/2$ and the solid curves represent the theoretical bounds of Equation (3). The projected density $p(b)$ is shown on the left. The lack of any clear correlation in the scatter plot is typical of all networks in Figure 1 (see Supplementary Figure S3 for additional correlograms). (d) Scatter plots (in light red) of betweenness $b$ (left) and salience $s$ (right) versus link weight $w$ in the air traffic network. The bottom and top of the lower whiskers, the dot, and the bottom and top of the upper whiskers correspond to the 0th, 25th, 50th, 75th, and 100th percentiles, respectively. The dashed line indicates a scaling relationship $w \sim b^{\gamma}$ with $\gamma \approx 0.2$. Although the network exhibits a positive correlation between link weight and link betweenness, the high-salience skeleton incorporates links with weights spanning the entire range of observed values; no clear correlation of weight with salience exists. These properties are observed in the other networks as well.



D. Origin of bimodal salience 

All the networks we consider feature broad link weight distributions $p(w)$ (see Figure 1b), some of which can be reasonably modeled by power laws $p(w) \sim w^{-(1+\alpha)}$, with exponents for many empirical data sets typically in the range $1 < \alpha < 3$ [29] (smaller $\alpha$ corresponds to broader $p(w)$). Although it may seem plausible that strong links in the tail of these distributions dominate the structure of shortest-path trees and thus cause the characteristic bimodal distribution of link salience, evidence against this hypothesis is already apparent in Figure 3d: links with high salience exhibit weights across many scales, and in particular low-weight links may possess high salience. Further evidence is provided in Figure 4a, which depicts the salience distribution for fully connected networks for a sequence of tail parameters $\alpha$. For values of $\alpha$ in the range observed in real networks, $p(s)$ is peaked near $s = 0$ and decreases with increasing $s$. A bimodal distribution of $s$ only emerges when $\alpha$ is unrealistically small ($\alpha < 1$), and is much less pronounced than in real networks (cf. Figure 2c). We conclude that broad, scale-free weight distributions $p(w)$ alone are insufficient to cause the natural, bimodal distribution $p(s)$ observed in real networks.

Figure 4: Salience in random networks. (a) Salience distributions $p(s)$ in fully connected networks with 1,000 nodes and weights assigned using a power law $p(w) \sim w^{-(1+\alpha)}$ for various tail exponents $\alpha$. Complete networks serve as models of systems with all-to-all interactions, such as the inter-industry trade network. Only for unrealistically broad weight distributions ($\alpha < 1$) does $p(s)$ exhibit a bimodal character. If $\alpha > 1$, bimodality is absent. (b) Salience distributions in preferential attachment networks (1,000 nodes) [23] with degree distribution $p(k) \sim k^{-3}$ and uniform weights do not exhibit bimodal salience (heavy dashed line). If, however, the power-law weight distribution is superimposed on the preferential attachment topology, bimodal salience emerges for realistic values of $\alpha$. (c) For a range of tail exponents $\alpha$ and $\beta$, the color code quantifies the magnitude $\chi$ of bimodality in the salience distribution $p_{\alpha,\beta}(s)$ of a network with a scale-free degree distribution with exponent $\beta$ (constructed using the configuration model [43]) and a scale-free weight distribution with exponent $\alpha$. Small values of $\chi$ correspond to a bimodal $p_{\alpha,\beta}(s)$. The bimodality measure $\chi$ was computed using the Kolmogorov–Smirnov distance between $p_{\alpha,\beta}(s)$ for $s > 0$ and the idealized reference distribution $q(s) = \delta(s - 1)$.
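The complete-network experiment of Figure 4a can be reproduced in miniature. The sketch below rests on several stated assumptions: it draws independent weights with Python's `random.paretovariate(alpha)`, whose density is $\alpha\, w^{-(1+\alpha)}$ for $w \ge 1$ and therefore matches $p(w) \sim w^{-(1+\alpha)}$; it uses far fewer nodes than the 1,000 in the figure, because a pure-Python Dijkstra per root is slow; and it bins the non-zero salience values into a simple equal-width histogram. The function names and the defaults for `n`, `seed`, `tol` and the bin count are hypothetical.

```python
import heapq
import random


def complete_graph_salience(n, alpha, seed=0, tol=1e-12):
    """Salience of every link of a complete graph with i.i.d. weights
    p(w) ~ w^-(1+alpha), w >= 1 (drawn with random.paretovariate)."""
    rng = random.Random(seed)
    d = [[0.0] * n for _ in range(n)]          # effective distances d_ij = 1 / w_ij
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = 1.0 / rng.paretovariate(alpha)

    counts = {}
    for r in range(n):                         # one shortest-path tree per root r
        dist = [float("inf")] * n
        dist[r] = 0.0
        preds = [set() for _ in range(n)]
        done = [False] * n
        heap = [(0.0, r)]
        while heap:
            du, u = heapq.heappop(heap)
            if done[u]:
                continue
            done[u] = True
            for v in range(n):
                if v == u or done[v]:
                    continue
                nd = du + d[u][v]
                if nd < dist[v] - tol:
                    dist[v], preds[v] = nd, {u}
                    heapq.heappush(heap, (nd, v))
                elif abs(nd - dist[v]) <= tol:
                    preds[v].add(u)            # keep tied (degenerate) paths
        for v in range(n):
            for u in preds[v]:
                link = (min(u, v), max(u, v))
                counts[link] = counts.get(link, 0) + 1

    return {(i, j): counts.get((i, j), 0) / n
            for i in range(n) for j in range(i + 1, n)}


def histogram(values, bins=10):
    """Counts of values in [0, 1] over equal-width bins."""
    counts = [0] * bins
    for x in values:
        counts[min(int(x * bins), bins - 1)] += 1
    return counts


for alpha in (0.5, 2.0):
    s = complete_graph_salience(60, alpha)
    print(alpha, histogram([x for x in s.values() if x > 0]))
```

Comparing `alpha=0.5` with `alpha=2.0` gives a qualitative impression of how the shape of $p(s)$ changes with the tail exponent; with so few nodes the histograms are noisy and should not be read as a reproduction of Figure 4a.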

Another potential source of the observed bimodality in $p(s)$ is the topological heterogeneity of a scale-free degree distribution $p(k) \sim k^{-(1+\beta)}$ with $0 < \beta < 2$ [22, 23, 43]. Figure 4b provides evidence that a scale-free topology alone also does not yield the characteristic bimodal salience distribution. In fact, the generic preferential attachment network ($\beta = 2$) with uniform weights exhibits a distribution of salience that is almost the complement of the observed pattern, with mostly intermediate values of link salience. The presence of hubs implies that any shortest path seeking out a node in a hub's region will most likely route through that hub, and links emanating from this hub are more likely to appear in many shortest-path trees. However, the hub-and-spoke structure of a preferential attachment network is only approximate; nodes at the end of a spoke are still likely to have random links to other areas of the network. For this reason, it is not typical in the uniform-weight preferential attachment network to find links that appear in nearly all shortest-path trees.

However, the observed bimodal distribution $p(s)$ can be generated in random networks by a combination of weight and degree variability, a property characteristic of the class of networks discussed here. Figure 4b also depicts $p(s)$ for preferential attachment networks that possess a scale-free distribution of both degree $k$ and weight $w$. As the weight distribution becomes broader (decreasing $\alpha$), and even in the absence of explicit degree-weight correlations, we see the emergence of bimodality in the salience distribution in these networks. Topological hubs are more likely to have extremely high-weight links simply because they have more links. Even when there is a topologically short path terminating at a spoke node that does not pass through the corresponding hub, it is less likely to be the shortest weighted path. Extreme weights amplify the effects of hubs by drawing more shortest paths through them. Moreover, Figure 4c demonstrates that the emergence of bimodal salience depends on the interplay between degree and weight distributions: the broader the degree distribution, the narrower the required weight distribution.

All of these results support the conclusion that a bi- 
modal salience distribution is characteristic of networks 
with strong heterogeneity in both topology and interac- 
tion strength, but that unweighted networks do not ex- 
hibit this property. 



E. Applications to network dynamical systems 

The relevance of link salience to dynamical processes that evolve on networks is an important issue, and one area of particular interest in network research is contagion phenomena. In this context, individuals in a population are represented by nodes, and interaction propensities between pairs of nodes by a weighted network. Contagion phenomena are modeled by transmissions between nodes along the links of the network, where the likelihood of transmission is quantified by the link weights. The central question in this class of models is how the topological properties of the network shape the dynamics of the process. Link salience can also provide useful information about the behavior of such a dynamical system. To illustrate this, we consider a simple stochastic SI epidemic model. At any given point in time, an infected node $i$ can transmit a disease to susceptible nodes $j$ at a rate determined by the link weight $w_{ji}$. The details of the model are provided in the Methods. We consider an epidemic on a planar disk network similar to that shown in Figure 3a. A single node is chosen at random as the outbreak location. At every step of the process, each infected node randomly selects a neighbor to infect with probability proportional to the link weight; eventually the entire network is infected. By keeping track of which links were used in the infection process, one obtains the infection hierarchy $H$, a directed tree structure that represents the epidemic pathway through the network. Since the process is stochastic, each realization generates a different infection hierarchy. For different initial outbreak nodes and realizations of the process, we calculate an infection frequency $h$ for each link: the number of times that link is used in the infection process, normalized by the number of realizations. The question is how successfully link salience, a topological quantity, can predict the infection frequency $h$, a dynamic quantity. Figure 5 shows the results for the two different link weight scenarios described in the Methods. The top panel shows networks with link weights narrowly and uniformly distributed around a constant value $w_0$; in the bottom panel, link weights are broadly distributed according to a power law. In both cases, link salience is highly correlated with the frequency $h$ of a link's appearance in infection hierarchies, while alternative link centrality measures such as weight and betweenness are not (see Figure 5 insets and Supplementary Information). On average, link salience gives a much more accurate prediction of the virulence of a link than other available measures of centrality, suggesting that this type of completely deterministic, static analysis could nonetheless play an important role in considering how best to slow spreading processes in real networks.
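A discrete-time version of the SI process described above can be sketched as follows. The Methods of the original model are not reproduced here, so the update rule is an assumption taken from the verbal description: at each step, every node infected before that step picks one neighbor with probability proportional to the link weight, and a susceptible target becomes infected through that link. The sketch treats links as undirected when accumulating $h$ (Figure 5 uses a directed salience, which this sketch does not compute), assumes a connected network so that the loop terminates, and uses hypothetical function names.

```python
import random


def si_infection_hierarchy(adj, outbreak, rng):
    """One realization; returns the directed links (i, j) along which j got infected.

    adj: {node: {neighbor: w}} for a connected, undirected weighted network.
    """
    infected_order = [outbreak]
    infected = {outbreak}
    hierarchy = []
    while len(infected) < len(adj):
        # Only nodes infected before this step transmit during this step.
        for i in list(infected_order):
            neighbors = list(adj[i])
            weights = [adj[i][k] for k in neighbors]
            j = rng.choices(neighbors, weights=weights)[0]  # P(j) proportional to w_ij
            if j not in infected:
                infected.add(j)
                infected_order.append(j)
                hierarchy.append((i, j))
    return hierarchy


def infection_frequency(adj, realizations=1000, seed=0):
    """h per undirected link: number of uses divided by the number of realizations,
    with a uniformly random outbreak node in each realization."""
    rng = random.Random(seed)
    nodes = list(adj)
    counts = {}
    for _ in range(realizations):
        outbreak = rng.choice(nodes)
        for i, j in si_infection_hierarchy(adj, outbreak, rng):
            link = frozenset((i, j))
            counts[link] = counts.get(link, 0) + 1
    return {link: c / realizations for link, c in counts.items()}


chain = {0: {1: 1.0}, 1: {0: 1.0, 2: 1.0}, 2: {1: 1.0, 3: 1.0}, 3: {2: 1.0}}
print(infection_frequency(chain, realizations=100))
```

On a chain every realization must use every link, so all $h = 1$; in general each realization contributes $N - 1$ link uses, so the $h$ values sum to $N - 1$. Correlating $h$ with salience values computed on the same network then amounts to pairing the two dictionaries by link.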



Figure 5: Salience predicts infection pathways in stochastic epidemic models. The scatter plots show the directed salience $s_d$ against the normalized frequency $h$ of appearance in infection pathways for each link in an ensemble of 100 networks, averaged over 1,000 epidemic realizations for each member of the ensemble. As in Figure 3, the plots are divided horizontally into bins, with the heavy black lines indicating quartiles within each bin. Insets show link betweenness $b$ versus $h$, and correlation coefficients are listed in Table III. Top: weights distributed narrowly and uniformly around a constant $w_0$. Bottom: weights distributed according to $p(w) \sim w^{-(1+\alpha)}$ with $\alpha = 2$.



III. DISCUSSION



As much recent work in network theory has shown [19J EUl 1551 [51], there is tremendous potential for extracting heretofore hidden information from the complex interactions between the elements of a system. However, until now these methods have relied on externally imposed parameters or null models. Here we have shown that typical empirical networks taken from a variety of fields do in fact permit the robust classification of links according to the node-consensus procedure we introduce, and that this leads naturally to the definition of a high-salience skeleton in these networks. Because vanishingly few links in empirical networks have intermediate values of salience, the identification of the skeleton is insensitive to the choice of salience threshold; indeed, if a tunable filtering procedure is desired, other methods may be more appropriate. Not all networks possess a skeleton: in simple unweighted models, the shortest-path structure is spread throughout the links. However, the presence of a skeleton is a generic feature of many heterogeneously weighted, empirical networks. We suggest that the likely cause in real networks is a hub-and-spoke topological structure combined with a broad weight distribution, which amplifies the tendency of hubs to capture shortest paths.

We believe that the concept of salience and the high-salience skeleton will become a vital component in understanding networks of the type discussed here and in the development of network-based dynamical models. The simple SI model we investigate here is only a starting point; it may be possible to leverage knowledge of a network's high-salience skeleton to develop dynamical models that do not require simulation on (or even knowledge of) the full network. In this context, the generic bimodal salience distribution also implies that in contagion phenomena only a small subset of links might typically be active, even if the process is stochastic. Those links, however, are almost certainly active irrespective of the outbreak location and the stochasticity of the process, which implies that in this regime the process becomes more predictable and the impact of stochasticity is decreased. This effect may shed new light on the impact of stochastic factors in disease dynamics that evolve in strongly heterogeneous networks.

Many of the networks we considered evolved over long periods of time, subject to external constraints and unknown optimization principles. The discovery that pronounced weight and degree heterogeneity, which are defining properties of the investigated networks, go hand in hand with generic properties of their underlying skeleton indicates that looking for common evolution principles could be another promising direction for further research.



IV. METHODS 
A. Network data sources 

Table II gives a brief definition of each network we examine here, and below we provide a summary of the networks along with data sources and references.

The Cash flow network was constructed from data collected through the Where's George bill-tracking website (http://www.wheresgeorge.com). The nodes are the 3,106 counties in the 48 contiguous United States (excluding Alaska and Hawaii), and the links measure the number of bills passing between pairs of counties per unit time. This network has been previously analyzed [SJIinilSS]; see in particular the supplement to for a wealth of detailed information regarding the construction and statistics of this network, as well as strong evidence for interpreting it as a proxy for individual mobility. The network of cash flow is constructed from approximately 10 million individual bank notes that circulate in the United States.

The Air traffic network measures global air traffic based on flight data provided by OAG Worldwide Ltd. (http://www.oag.com) and includes all scheduled commercial flights in the world. Nodes represent airports worldwide. Link weights measure the total number of passengers traveling between a pair of airports by direct flights per year. This network is well-represented in the literature [5", 'SI US) |13J gl]; we reduce it to 95% flux as described in [45]. Total traffic in this network amounts to approximately 3 billion passengers per year.

The Shipping network quantifies international marine freight traffic based on data provided by IHS Fairplay (http://www.ihs.com/products/maritime-information/index.aspx), which includes itineraries for 16,363 container ships. Nodes represent ports, and links measure the number of commercial cargo vessels traveling between those ports during 2007. The network is available at http://www.mathmod.icbm.de/45365.html and further discussion can be found in [46].

The Commuting network is based on surveys conducted by the US Census Bureau during the 2000 census, and reflects the daily commuter traffic between US counties; the data are publicly available at http://www.census.gov/population/www/cen2000/commuting/files/2KRESC0_US.zip. Nodes in this network represent the counties of the 48 contiguous states (excluding Alaska and Hawaii), and links measure the number of people commuting between pairs of counties per day.

The Neural network is derived from the Caenorhabditis elegans nematode. Nodes represent neurons, and links measure the number of synapses or gap junctions connecting a pair of neurons. Experimental data are described in Ref. [47] and analyzed in Ref. [48]; the network is available at http://www-personal.umich.edu/~mejn/netdata/

The Metabolic network measures interactions in the bacterium Escherichia coli [16, 49]. Nodes represent metabolites, and links measure the effective kinetic rates of the reactions in which a pair of metabolites participates. We use only the largest connected component of this network.

The Food web network is a representative food web from a list of publicly available data sets of the same type (see http://vlado.fmf.uni-lj.si/pub/networks/data/bio/foodweb/foodweb.htm for networks in Pajek format, a report [50] on trophic analysis of the Florida Bay food web available at http://www.cbl.umces.edu/~atlss/FBay701.html, and Refs. |3ff 35| ]). Nodes represent species in the Florida Bay ecosystem, and links measure the consumed biomass in grams of carbon per year across a link.

In the Inter-industry network, nodes represent industrial sectors in the United States, and their connections are computed from input-output tables prepared by the US Bureau of Economic Analysis, available at http://www.bea.gov/industry/io_benchmark.htm. We use data from 2002, the most recent year for which measurements are available. Nodes in this network represent particular industries (for example, "tobacco production" or "cutlery and hand tool manufacturing"), and links measure an average interaction between two industries.



Network          Nodes                                  Link units
Cash flow        Counties, continental United States    Number of bills/time
Air traffic      Airports, worldwide                    Number of passengers/time
Shipping         Ports, worldwide                       Number of cargo ships/time
Commuting        Counties, continental United States    Number of commuters/time
Neural           Neurons, C. elegans                    Number of synapses and gap junctions
Metabolic        Metabolites, E. coli                   Effective kinetic reaction rate
Food web         Species, Florida Bay food web          Exchanged biomass/time
Inter-industry   Industrial sectors, United States      Average input required for fixed output (USD)
World trade      Countries                              Average value of traded assets/time (USD)
Collaboration    Scientists                             Number of co-authored papers

Table II: Definition of nodes and links in empirical networks. The entities represented by nodes, as well as the units measured by link weight, are listed for every network.



Given two industries x and y, input-output data measure the amount (USD) of input that x demands from y in order to produce one dollar of output, and we take the weight of the link connecting x and y to be the geometric mean of the input-output demand of x on y and of y on x.
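The symmetrization step can be sketched as follows. This is a minimal illustration, assuming the input-output coefficients have already been arranged into a square NumPy array `demand`, with `demand[x, y]` the dollars of input that industry x requires from industry y per dollar of its own output; the array layout, the toy numbers, and the function name `interindustry_weights` are hypothetical, not taken from the BEA files.

```python
import numpy as np


def interindustry_weights(demand):
    """Symmetric link weights from a square input-output demand matrix.

    demand[x, y]: input (USD) that industry x demands from industry y per
    dollar of x's output (hypothetical layout). The weight of the link
    between x and y is the geometric mean sqrt(demand[x, y] * demand[y, x]).
    """
    a = np.asarray(demand, dtype=float)
    if a.ndim != 2 or a.shape[0] != a.shape[1]:
        raise ValueError("demand must be a square matrix")
    # Element-wise product with the transpose pairs demand[x, y] with demand[y, x].
    return np.sqrt(a * a.T)


# Toy 3-industry example (illustrative numbers only).
demand = np.array([[0.00, 0.20, 0.00],
                   [0.05, 0.00, 0.30],
                   [0.10, 0.30, 0.00]])
w = interindustry_weights(demand)
print(w.round(3))
```

One consequence of the geometric mean is that a link vanishes whenever either direction has zero demand, as for industries 0 and 2 in the toy matrix, so purely one-sided dependencies do not produce links.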

The World trade network is based on data prepared by the United States National Bureau of Economic Research and measures the value (in nominal thousands of USD) of goods traded between countries from 1962-2000. Nodes represent countries, and links measure the value of goods traded between countries. The data and extensive documentation are available at http://cid.econ.ucdavis.edu/data/undata/undata.html. A series of papers analyzes a similar data set from a different source [35, 51-53].

The Collaboration network is based on co-authorship of academic papers in the high-energy physics community from 1995-1999. Nodes represent individuals, and links measure the number of papers co-authored [54]. The data are publicly available at http://www-personal.umich.edu/~mejn/netdata/



B. Link salience and betweenness centrality 

Link salience $s$ and betweenness centrality $b$ are based on the notion of shortest paths in weighted networks. Given a weighted network defined by the weight matrix $w_{ij}$ (not necessarily symmetric) and a shortest path that originates at node $x$ and terminates at node $y$, it is convenient to define the indicator function

$$ \sigma_{ij}(y,x) = \begin{cases} 1 & \text{if link } i \to j \text{ is on the shortest path from } x \text{ to } y, \\ 0 & \text{otherwise.} \end{cases} $$

A shortest path tree $T(x)$ rooted at node $x$ can be represented as a matrix with elements

$$ T_{ij}(x) = \begin{cases} 1 & \text{if } \sum_y \sigma_{ij}(y,x) > 0, \\ 0 & \text{otherwise,} \end{cases} $$

and the salience $s_{ij}$ of link $i \to j$ is given by

$$ s_{ij} = \frac{1}{N} \sum_x T_{ij}(x) = \langle T_{ij}(x) \rangle_x, \qquad (4) $$

where $\langle \cdot \rangle_x$ denotes the average across the set of root nodes $x$.
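A direct way to evaluate Eq. (4) is to build one shortest path tree per root node and count how often each directed link appears. The sketch below is a minimal pure-Python illustration, not the authors' code. It assumes that link length is the effective distance $1/w_{ij}$, that the network is given as a dict of dicts `weights[i][j] = w_ij` with mutually comparable node labels, and that ties between equally short paths are broken arbitrarily by Dijkstra's algorithm; the function names and the toy network are hypothetical.

```python
import heapq
from collections import Counter


def shortest_path_tree(weights, root):
    """Directed links (i, j) of one shortest path tree T(root).

    weights[i][j] = w_ij > 0; link length is assumed to be 1 / w_ij.
    Ties between equally short paths are broken by whichever path
    Dijkstra's algorithm settles first, so exactly one tree is returned.
    """
    dist = {root: 0.0}
    parent = {}
    heap = [(0.0, root)]
    settled = set()
    while heap:
        d, i = heapq.heappop(heap)
        if i in settled:
            continue  # stale heap entry
        settled.add(i)
        for j, w_ij in weights[i].items():
            candidate = d + 1.0 / w_ij
            if candidate < dist.get(j, float("inf")):
                dist[j] = candidate
                parent[j] = i
                heapq.heappush(heap, (candidate, j))
    # Every reached node except the root hangs below its parent: link parent -> child.
    return {(parent[j], j) for j in parent}


def salience(weights):
    """Directed salience s_ij = <T_ij(x)>_x, averaged over all N root nodes."""
    n = len(weights)
    counts = Counter()
    for x in weights:
        counts.update(shortest_path_tree(weights, x))
    return {link: c / n for link, c in counts.items()}


# Toy triangle: the weak link a-c (length 10) is bypassed via b (length 2).
w = {"a": {"b": 1.0, "c": 0.1},
     "b": {"a": 1.0, "c": 1.0},
     "c": {"a": 0.1, "b": 1.0}}
s = salience(w)
# A tree uses an undirected link in at most one direction, so the
# undirected salience of {i, j} is s_ij + s_ji.
s_ab = s.get(("a", "b"), 0.0) + s.get(("b", "a"), 0.0)
s_ac = s.get(("a", "c"), 0.0) + s.get(("c", "a"), 0.0)
print(s_ab, s_ac)
```

Because every root contributes a tree with $N-1$ links, the directed saliences of a connected network sum to $N-1$. In the toy triangle the two strong links each reach undirected salience 1 and the weak link stays at 0.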

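Betweenness, on the other hand, is defined according to

$$ b_{ij} = \frac{1}{N^2} \sum_{x,y} \sigma_{ij}(y,x) = \langle \sigma_{ij}(y,x) \rangle_{x,y}, $$

where $\langle \cdot \rangle_{x,y}$ denotes the average over all pairs of terminal nodes. The relation of betweenness and salience can be made more transparent by rewriting this expectation value as a sequential average over all nodes,

$$ b_{ij} = \frac{1}{N} \sum_x b_{ij}(x), $$

with

$$ b_{ij}(x) = \frac{1}{N} \sum_y \sigma_{ij}(y,x) = \langle \sigma_{ij}(y,x) \rangle_y $$

fixing root node $x$. Thus $b_{ij}(x)$ is the conditional betweenness of link $i \to j$ if the set of shortest paths is restricted to those rooted at $x$. From this it follows that

$$ b_{ij} = \big\langle \langle \sigma_{ij}(y,x) \rangle_y \big\rangle_x. \qquad (5) $$

Comparing (4) with (5), we see that the difference between salience and betweenness is equivalent to the difference between the shortest path trees $T_{ij}(x)$ and the conditional betweenness $b_{ij}(x)$. Whereas all links in the shortest path tree are weighted equally, links with non-zero conditional betweenness tend to become less central as they become further separated from the root node $x$, because a link far from the root lies on the paths to fewer target nodes. Formally, we can write

$$ s_{ij} = \Big\langle \Theta\big[ \langle \sigma_{ij}(y,x) \rangle_y \big] \Big\rangle_x, \qquad (6) $$

with $\Theta(z) = 1$ if $z > 0$ and $\Theta(z) = 0$ otherwise.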
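The decomposition in Eqs. (5) and (6) can be checked numerically: for each root $x$, compute the conditional betweenness $b_{ij}(x)$ by walking back from every target $y$ along the shortest path tree, then average $b_{ij}(x)$ to obtain betweenness and $\Theta[b_{ij}(x)]$ to obtain salience. This is a sketch under stated assumptions: link length is taken to be $1/w_{ij}$, the network is connected with comparable node labels, one tree per root is used with arbitrary tie-breaking (following the single-path indicator $\sigma_{ij}$, whereas many betweenness implementations split credit among degenerate shortest paths), and the function names are hypothetical.

```python
import heapq
from collections import defaultdict


def tree_parents(weights, root):
    """Parent map of a shortest path tree rooted at `root` (Dijkstra, length 1/w)."""
    dist, parent, heap, settled = {root: 0.0}, {}, [(0.0, root)], set()
    while heap:
        d, i = heapq.heappop(heap)
        if i in settled:
            continue
        settled.add(i)
        for j, w_ij in weights[i].items():
            if d + 1.0 / w_ij < dist.get(j, float("inf")):
                dist[j] = d + 1.0 / w_ij
                parent[j] = i
                heapq.heappush(heap, (dist[j], j))
    return parent


def salience_and_betweenness(weights):
    """Return (s, b) as dicts over directed links; assumes a connected network."""
    nodes = list(weights)
    n = len(nodes)
    s, b = defaultdict(float), defaultdict(float)
    for x in nodes:
        parent = tree_parents(weights, x)
        cond = defaultdict(float)  # conditional betweenness b_ij(x) = <sigma_ij(y, x)>_y
        for y in nodes:
            j = y
            while j != x:  # walk the path x -> y backwards, one link at a time
                i = parent[j]
                cond[(i, j)] += 1.0 / n
                j = i
        for link, value in cond.items():
            b[link] += value / n  # Eq. (5): average of b_ij(x) over roots x
            s[link] += (1.0 if value > 0 else 0.0) / n  # Eq. (6): average of Theta[b_ij(x)]
    return dict(s), dict(b)


# Toy triangle (hypothetical weights): the weak link a-c is bypassed via b.
w = {"a": {"b": 1.0, "c": 0.1},
     "b": {"a": 1.0, "c": 1.0},
     "c": {"a": 0.1, "b": 1.0}}
s, b = salience_and_betweenness(w)
print(s[("a", "b")], b[("a", "b")])
```

In the toy triangle, link a -> b has conditional betweenness 2/3 from root a (it leads to both b and c) and 0 from the other roots, so $b_{ab} = 2/9$, whereas $\Theta$ maps the 2/3 to 1 and gives $s_{ab} = 1/3$.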



C. Epidemic simulations 

In order to determine the relevance of link salience to contagion phenomena on networks, we investigated the correlation between link salience and the frequency at which links participate in a generic contagion process that spreads through planar, random triangulated networks.

Each network consists of $N = 100$ nodes distributed uniformly at random in a planar disk; the links of the network are given by the Delaunay triangulation of the nodes. The planar distance between nodes is roughly proportional to the number of links in a shortest (network) path between them. A representative example of this type of topology is shown in Figure 3a. We consider two different weight scenarios:

1. Quasi-homogeneous weights: Each link is assigned a unit weight $w$ modified by a small additive perturbation $\xi$,

$$ w = 1 + \xi, $$

where $\xi$ is uniformly distributed in the interval $[-0.01, 0.01]$.

2. Broadly distributed weights: Each link is assigned a random weight drawn from a power-law distribution with PDF $p(w)$ and exponent parameter $\alpha = 2$ (see Figure 5).
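The test networks can be generated along the following lines. This is a sketch under stated assumptions: it uses `scipy.spatial.Delaunay` for the triangulation, samples points uniformly in the unit disk, and, for the broad scenario, assumes a classical Pareto law $p(w) \propto w^{-(1+\alpha)}$ for $w \ge 1$ with $\alpha = 2$ (the exponent convention is an assumption); the function name `planar_triangulated_network` and its seed handling are hypothetical.

```python
import numpy as np
from scipy.spatial import Delaunay


def planar_triangulated_network(n=100, scenario="homogeneous", alpha=2.0, seed=None):
    """Random planar network: Delaunay triangulation of n points in the unit disk.

    Returns a dict {(i, j): w} with i < j (undirected links).
    scenario="homogeneous": w = 1 + xi with xi ~ Uniform[-0.01, 0.01).
    scenario="broad": assumed Pareto weights, p(w) ~ w**-(1 + alpha) for w >= 1.
    """
    rng = np.random.default_rng(seed)
    # Uniform sampling in a disk: radius ~ sqrt(U) keeps the areal density flat.
    r = np.sqrt(rng.uniform(size=n))
    theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
    points = np.column_stack((r * np.cos(theta), r * np.sin(theta)))

    # Collect the three sides of every Delaunay triangle as undirected links.
    edges = set()
    for tri in Delaunay(points).simplices:
        for a, b in ((0, 1), (1, 2), (0, 2)):
            i, j = sorted((int(tri[a]), int(tri[b])))
            edges.add((i, j))
    edges = sorted(edges)

    if scenario == "homogeneous":
        weights = 1.0 + rng.uniform(-0.01, 0.01, size=len(edges))
    elif scenario == "broad":
        # Generator.pareto draws Lomax samples; adding 1 gives a classical Pareto law.
        weights = 1.0 + rng.pareto(alpha, size=len(edges))
    else:
        raise ValueError(f"unknown scenario: {scenario!r}")
    return {edge: float(w) for edge, w in zip(edges, weights)}


net = planar_triangulated_network(100, "homogeneous", seed=1)
print(len(net), min(net.values()), max(net.values()))
```

A planar triangulation of $N$ points has at most $3N - 6$ links, so the resulting networks are sparse, and shortest paths are forced to follow the local geometry.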

We simulate a stochastic Susceptible-Infected (SI) epidemic process. A single stochastic realization of the process is generated as follows. Given a network represented by the symmetric weight matrix $w_{ij}$, which quantifies the interaction strength of a pair of nodes, we define the probability $P_{ij}$ that node $j$ infects node $i$ in a fixed time interval $\Delta t$,

$$ P_{ij} = \gamma \, p_{ij}, \qquad i \neq j, $$

where $\gamma \ll 1/\Delta t$ is the infection rate, and $p_{ij} = w_{ij} / \sum_k w_{kj}$. Time proceeds in discrete steps; at each step, each infected node $j$ chooses an adjacent node to infect at random with probabilities given by $P_{ij}$. If node $j$ infects a susceptible node $i$, then the link $(j, i)$ is added to the infection hierarchy $H$, which can be represented as a matrix $H_{ji}$. In the long time limit every node is infected, and $H$ is a tree structure recording the first infection paths from the outbreak location $s$ to every other node.
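One realization of this SI process might be implemented as follows. It is a sketch, not the authors' simulation code: the per-step attempt probability `gamma` (which must satisfy gamma <= 1) and its default value are hypothetical, nodes infected during a step only start spreading in the next step, and when two infected nodes pick the same susceptible node in one step, the first in sorted order is credited. It assumes a connected network with symmetric weights given as a dict of dicts.

```python
import random


def si_realization(w, source, gamma=0.2, rng=None):
    """One stochastic SI run; returns the infection hierarchy H as a set of links (j, i).

    w: symmetric dict of dicts, w[j][i] = weight of the link between j and i.
    With probability gamma an infected node j picks a neighbour i with
    probability p_ij = w_ij / sum_k w_kj, so that P_ij = gamma * p_ij.
    Assumes a connected network, so every node is eventually infected.
    """
    if rng is None:
        rng = random.Random()
    neighbours = {j: list(w[j]) for j in w}
    # For symmetric weights, sum_k w_kj equals the row sum of w[j].
    probs = {j: [w[j][i] / sum(w[j].values()) for i in neighbours[j]] for j in w}
    infected = {source}
    hierarchy = set()
    while len(infected) < len(w):
        newly_infected = set()
        for j in sorted(infected):
            if rng.random() < gamma:
                i = rng.choices(neighbours[j], weights=probs[j])[0]
                if i not in infected and i not in newly_infected:
                    newly_infected.add(i)
                    hierarchy.add((j, i))  # first infection of i, via link j -> i
        infected |= newly_infected
    return hierarchy


# Toy 4-node network (hypothetical weights); node 3 hangs off node 2.
w = {0: {1: 1.0, 2: 0.5}, 1: {0: 1.0, 2: 2.0},
     2: {0: 0.5, 1: 2.0, 3: 1.0}, 3: {2: 1.0}}
H = si_realization(w, source=0, gamma=0.5, rng=random.Random(7))
print(sorted(H))
```

Because only the first infection of each node is recorded, $H$ always contains $N - 1$ links, one into each non-source node, which is why it forms a tree.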

For a given network, we compute $R = 1{,}000$ different epidemic realizations with random outbreak locations $s_k$, resulting in an ensemble of infection hierarchies $H^{(k)}_{mn}$. The key question is how frequently a link in the network participates in an epidemic, and we define the infection frequency of a link as

$$ h_{mn} = \frac{1}{R} \sum_{k=1}^{R} H^{(k)}_{mn}. $$

We compute the infection frequency for 100 random networks under each weight scenario, and Figure 5 illustrates the degree to which the directed salience $s_{mn}$ predicts the dynamic quantity $h_{mn}$. The correlation of $h_{mn}$ with directed salience and with the two measures of centrality we consider here, weight $w_{mn}$ and betweenness $b_{mn}$, is shown in Table III.

Weight scenario    s_d vs h    b vs h    w vs h
Homogeneous        0.734       0.0756    0.00545
Broad              0.803       0.329     0.393

Table III: Correlation of other measures with infection frequency. The Pearson correlation coefficients of salience $s_d$, betweenness $b$, and weight $w$ with infection pathway frequency $h$ are shown.
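Given an ensemble of infection hierarchies, the infection frequency $h_{mn}$ and its Pearson correlation with another link measure (salience, betweenness, or weight) can be computed along these lines. The sketch assumes hierarchies are stored as sets of directed links and that a link missing from a dictionary has value 0; the function names are hypothetical, and the toy ensemble and salience values are illustrative, not data from this study.

```python
import math
from collections import Counter


def infection_frequency(hierarchies):
    """h_mn = (1/R) * sum_k H^(k)_mn over R realizations (each a set of links)."""
    r = len(hierarchies)
    counts = Counter(link for hierarchy in hierarchies for link in hierarchy)
    return {link: c / r for link, c in counts.items()}


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def correlate(measure, h, links):
    """Correlate a link measure (e.g. directed salience) with h over `links`."""
    return pearson([measure.get(link, 0.0) for link in links],
                   [h.get(link, 0.0) for link in links])


# Illustrative toy ensemble of R = 4 hierarchies on a 3-node network.
runs = [{("a", "b"), ("b", "c")}, {("a", "b"), ("b", "c")},
        {("b", "a"), ("b", "c")}, {("c", "b"), ("b", "a")}]
h = infection_frequency(runs)
links = [("a", "b"), ("b", "a"), ("b", "c"), ("c", "b")]
s_d = {("a", "b"): 1 / 3, ("b", "a"): 2 / 3, ("b", "c"): 2 / 3, ("c", "b"): 1 / 3}
print(h, correlate(s_d, h, links))
```

Links that never appear in any hierarchy get $h = 0$ and still enter the correlation, which matters for sparse skeletons where most links are rarely used.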